My colleagues have discussed their research and experiences with film tests
of cognition and memory. They have shown that film tests may tap unexplored
cells of Guilford's Structure of Intellect, that film tests are practical to
administer, and that traits measurable by film tests, and perhaps only by film
tests, can be replicated.

In this symposium my position is that of the caboose, and the job of the
man in the caboose is to look ahead at the freight that has gone before, watch
out for hot boxes, the potential trouble spots, and stop the train momentarily,
if need be, to attend to the hot boxes.



In the journey thus far I have noticed three types of hot boxes that
deserve attention by the engineers.

The first type of hot box is caused by methodological problems in analyzing
the data. The work by Seibert and Snow (1965) on film testing, affectionately
called the "Green Report" because of its green cover, used a data matrix with
100 subjects and 96 variables. A nearly square data matrix such as this
capitalizes on chance to a great extent. To obtain data on 96 variables you
should have nearly a thousand observations. We generally prefer to have several
times as many observations as variables. Although this point has been made
before, the fact that it is sometimes forgotten indicates that it needs to be
made again.

The second film testing report, called the "Yellow Report" (Seibert, Reid,
and Snow, 1967), and the McDaniel-Kephart report have a somewhat more
comfortable ratio of observations to variables: between 4 and 6 to 1.
A second methodological problem arises from a failure to use the best tools
available to determine reliabilities of experimental tests. Cronbach's alpha
serves as an upper estimate of equivalent-form reliability, and it is easy to do
by machine. Reliability can also be computed by analysis of variance procedures.
Despite the relative ease with which reliability estimates may be obtained, only
one third of the 96 experimental variables in the "Green Report" have
Kuder-Richardson estimates. For the remaining 65 variables, the authors give the
communality as the estimate of reliability. One could argue that communalities
are lower-bound and thus conservative estimates of reliability. True. However,
the communalities that were reported are only estimates of communalities, and so
we have the situation of trying to estimate a parameter by a statistic that is
two generations removed. It is like trying to guess what your daughter will look
like by looking at her grandmother. Why disturb her grandmother when her mother
is right beside you? The second film testing report, the Yellow Report, does
show progress since it reports K-R 20's for 17 of 23 experimental tests. The
general Kuder-Richardson is, of course, a special case of Cronbach's alpha. The
Yellow Report also has one test-retest reliability (p. 24). A film test called
Short Term Color Memory I has a test-retest reliability of .82, quite
respectable for an experimental test. This figure is surprisingly high when you
consider how it was obtained. Normally, in the determination of a test-retest
reliability, subjects are given a test, then perform some unrelated task, and
then are given the test again. In this study of film tests, subjects were given
this test, then had about five hours of similar tests on which to practice, and
then were given the retest. That the retest reliability was as high as .82
despite all this practice deserves more emphasis than it received in the report.
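
To illustrate how little machinery these estimates require, here is a minimal
sketch in Python of Cronbach's alpha and K-R 20. The function names and the
small 0/1 score matrix are hypothetical and are not taken from either report.
The sketch assumes population variances (division by N) are used throughout,
under which K-R 20 and alpha agree exactly for right/wrong items.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` is a list of subjects, each a list of item scores."""
    k = len(scores[0])
    # Variance of each item across subjects (population form, divide by N).
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total test score across subjects.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """Kuder-Richardson formula 20 for items scored 0 or 1."""
    n = len(scores)
    k = len(scores[0])
    # Proportion of subjects passing each item.
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical data: 6 subjects by 4 right/wrong items (illustration only).
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

print("alpha :", cronbach_alpha(example_scores))
print("K-R 20:", kr20(example_scores))
```

For 0/1 items the item variance equals p(1 - p), which is why the two functions
return the same value on such data.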

Three of the reliability estimates for the 10 McDaniel-Kephart film tests
are too low to be acceptable and most could be improved. The authors are aware
of this and suggest ways of improvement. McDaniel and Kephart also conducted
item analyses, but they used another achievement variable to determine high and
low groups rather than test total score. The use of an outside variable in
partitioning subjects for an item analysis has certain problems, and I would
suggest that their final report dispense with that procedure and instead use
the total score of that particular test when analyzing its items.
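
The suggested procedure, partitioning on the test's own total score, can be
sketched as follows. This is only an illustration under stated assumptions: the
function name, the 27% upper and lower groups (a common convention, not
something specified in the reports), and the score matrix are all hypothetical.

```python
def discrimination_index(scores, item, fraction=0.27):
    """Upper-lower discrimination index for one 0/1 item.

    High and low groups are formed from the total score on the same test,
    not from an outside variable. `fraction` is an assumed convention.
    """
    # Rank subjects from highest to lowest total score on this test.
    ranked = sorted(scores, key=sum, reverse=True)
    n_group = max(1, round(len(ranked) * fraction))
    upper = ranked[:n_group]
    lower = ranked[-n_group:]
    # Proportion passing the item in each group.
    p_upper = sum(row[item] for row in upper) / n_group
    p_lower = sum(row[item] for row in lower) / n_group
    return p_upper - p_lower


# Hypothetical data: 10 subjects by 4 right/wrong items (illustration only).
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

for i in range(4):
    print("item", i, "D =", discrimination_index(item_scores, i))
```

An index near 1 means the item separates high and low scorers on the test
itself; an index near 0 means it does not.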

Some interesting questions arise upon examining the film test data in a
multitrait-multimethod (Campbell and Fiske, 1959) perspective. All film
investigators were conscious of correlations of film tests with other
standardized or experimental cognitive tests. McDaniel (1971) listed 91
significant correlations of the ten film tests with subscores of S standardized
achievement and ability tests, and pointed out that these correlations were
acceptably low. The highest of these 91 correlations was only .55 (Temporal
Memory Span with Gates Reading Speed); one fourth of these correlations were in
the 40's and the remainder were usually in the 30's. Assuming the 10 film tests
are reliable, they are not measuring what the standardized achievement tests are
measuring. Their shared common variance is about 30% at the largest and
typically is 10%. If the film tests were extremely easy, then their smaller
range and/or skewed distribution will have shrunk their correlations with these
standardized criteria, which have a different, presumably symmetric,
distribution. If the correlations remain small after test revision, and the
McDaniel-Kephart film tests prove reliable, then the next steps would be to seek
instruments with which the film tests correlate negatively and to design other
(non-film) methods to measure and support the hypothesized traits.
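
The arithmetic behind this argument is short enough to sketch. Shared variance
is the square of the correlation, so a correlation of .55 corresponds to
roughly 30% and one in the low 30's to roughly 10%. The proviso "assuming the
film tests are reliable" matters because unreliability lowers observed
correlations; the standard correction for attenuation shows by how much. The
function names and the reliability values below are hypothetical, chosen only
for illustration.

```python
from math import sqrt


def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r * r


def disattenuated(r_xy, rel_x, rel_y):
    """Correction for attenuation: the correlation expected between two
    measures if both were perfectly reliable."""
    return r_xy / sqrt(rel_x * rel_y)


# Correlations of the sizes discussed in the text.
for r in (0.55, 0.45, 0.32):
    print(f"r = {r:.2f}  shared variance = {shared_variance(r):.3f}")

# Hypothetical reliabilities (NOT from the reports): an observed r of .40
# between a film test with reliability .64 and a criterion with reliability 1.0.
print("corrected r:", disattenuated(0.40, 0.64, 1.0))
```

If the corrected correlations were still low, the case that the film tests
measure something different from the achievement tests would be stronger.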

Let me close this section with a point of commendation about the Yellow
Report. Chester Harris (1967) pointed out that if a data matrix is factored by
two or three methods rather than by one method, greater faith can be placed in
those factors common to all methods. The Yellow Report used three different
methods of "factor" analysis and noted that all three solutions were nearly the
same.

So much for methodological problems. The second hot box concerns
theoretical problems. A strength of all the reports has been the authors'
efforts to define a trait, and then to design tests to measure that trait. But
the designing of a test to measure a trait is not always successful. Suppose you
design a test for memory for figural transformations. If subjects actually
process the information pertaining to the test as memory for figural units, then
the test measures memory for figural units, and not memory for figural
transformations. A factor analysis may not offer wholly decisive results, since
other tests for figural transformations may be complex and also contain some
figural units variance. As an example, the time-space translation test presents
two colored pegs that move across a colored checkerboard. Do subjects remember
the movement of the pegs or do they simply remember the final position of the
pegs? The criterion task only demands that subjects remember the final position
of the pegs. Does the test measure time-space translation or does it measure
simple positional memory? This whole issue of validation can be directed at
several of the film tests reported. McDaniel and Kephart hypothesized that their
ten tests would describe four factors. A Kaiser image analysis (Kaiser, 1963;
Reid, 196S) and a components analysis (little jiffy) on their ten film tests
both resulted in only two "factors," not four, and both factors are complex.
Although their four hypothesized factors have an appealing theoretical
rationale, the implementation of their four factors into film tests has not been
satisfactory so far.

Let me give two more examples of possible theoretical problems before 
proceeding to the third and last hot box. 

Both the Green and Yellow Reports describe a "Fleishmanesque" analysis.
Three of their film tests had items that were presented for various lengths of
time. The authors found that subjects used a differing proportion of abilities
to process information appearing for different lengths of time. Some abilities
were not used at all at some time intervals. Unfortunately, the findings in the
Green Report were not wholly replicated in the Yellow Report. Further, no
satisfactory hypothesis has yet been generated to explain the differences that
occur across intervals as short as tenths of a second. Two things need to be
done. First, we need to know what proportion of the differences across
replications is due to error variance and what proportion is due to true
variance. Second, when the amount of true variance is known, a hypothesis should
be generated and tested to explain the variation in true variance across stimuli
differing in time interval.

One of J. J. Gibson's (1954) more interesting ideas was the interaction
between vision and kinesthesia. In a film testing situation, no interaction can
occur, since the subject's behavior is being sampled at a constant (motionless)
state. My last theoretical query is: To what extent can the results from the
film test studies be generalized to the normal state of an observer in motion?

I turn now to the third and last hot box, the hot box of physiological
problems of human perception.



Care needs to be taken that the observer can see the stimulus. Both the
Green and Yellow reports indicate that viewing distance has a significant
negative loading on the serial integration factor. The serial integration factor
should be replicated in an experimental situation where viewing distance is not
a critical factor.

With slow-moving stimuli, subjects' eyes move rapidly enough so that detail
is not lost in peripheral vision. With short, tachistoscopic stimuli, subjects'
eyes may not be able to move fast enough to encompass all the stimuli by
non-peripheral vision. If peripheral vision is required, then as stimulus time
intervals grow shorter, subjects will not obtain information for color and form
even though they may still obtain information requiring visual acuity. For
example, subjects looking at a tachistoscopic array of colors might not see all
the colors, whereas they might be able to discern a similar array of broken ...

... it is not enough to say that the subject is seated 6 feet from the
screen. The experimenter must give assurance that the angle of view is small
enough that all the stimulus can be seen non-peripherally, or must demonstrate
that the subject obtains all the essential information by eye movement.
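
The angle of view in question follows from simple trigonometry: a stimulus of
width w viewed head-on from distance d subtends an angle of 2 * atan(w / (2d)).
The sketch below assumes a flat stimulus centered on the line of sight; the
function names, the stimulus width, and the tolerance passed to the check are
hypothetical, since the reports give only the viewing distance.

```python
from math import atan, degrees


def visual_angle_deg(stimulus_width, viewing_distance):
    """Angle (in degrees) subtended by a stimulus centered on the line of sight.

    Both arguments must be in the same unit (for example, feet).
    """
    return degrees(2 * atan(stimulus_width / (2 * viewing_distance)))


def within_tolerance(stimulus_width, viewing_distance, max_angle_deg):
    """True if the stimulus fits inside the angle the experimenter accepts
    as non-peripheral. `max_angle_deg` is an assumption to be justified
    by the experimenter, not a value taken from the reports."""
    return visual_angle_deg(stimulus_width, viewing_distance) <= max_angle_deg


# Hypothetical case: a projected array 1 foot wide, subject seated 6 feet away.
angle = visual_angle_deg(1.0, 6.0)
print(f"visual angle: {angle:.2f} degrees")
print("within a 5-degree tolerance:", within_tolerance(1.0, 6.0, 5.0))
```

Reporting this angle alongside the seating distance would let a reader judge
whether the whole stimulus could have been seen without eye movement.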

Horizontal angle of the viewer from the screen is a third consideration.
The viewer off to the far right or left of the screen sees a different and
distorted image. Apparently, however, viewers have been seated within horizontal
angle tolerances.

The findings of the film test studies remain as interesting as ever.
More attention should be given to their role in discrimination and prediction,
of which the McDaniel-Kephart studies are a start. I would like to prophesy
that unless a practical utility can be found for the film tests, they will
gather dust and be sold for scrap. The train of film tests has had a few hot
boxes, but nothing insuperable. I trust that the train will reach a useful
destination, and not just go down the track, becoming smaller and smaller, and
finally disappear into the sunset.